Section: New Results
Statistical learning methods for high-dimensional data
-
Genuer R, Poggi J-M, Tuleau-Malot C, Villa-Vialaneix N. Random Forests for Big Data. Big Data Research, 9 (2017). [18]
This paper addresses the analysis of Big Data with Random Forests. It includes a review of existing algorithms, a simulation study, and recommendations.
-
Agniel D, Hejblum BP. Variance component score test for time-course gene set analysis of longitudinal RNA-seq data. Biostatistics, 18(4):589–604 (2017). [16]
We propose tcgsaseq, a principled, model-free, and efficient method for detecting longitudinal changes in RNA-seq gene sets defined a priori. Applied to both simulated data and two real datasets, tcgsaseq exhibits very good statistical properties, with increased stability and power compared to state-of-the-art methods.
-
Hejblum BP, Alkhassim C, Gottardo R, Caron F, Thiébaut R. Sequential Dirichlet Process Mixtures of Multivariate Skew t-distributions for Model-based Clustering of Flow Cytometry Data. Preprint on arXiv. [39]
We propose a Bayesian nonparametric approach, based on Dirichlet process mixtures of multivariate skew t-distributions, for model-based clustering of flow cytometry data. The approach robustly estimates the number of cell populations from the data.
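
To illustrate why a Dirichlet process prior does not require fixing the number of clusters in advance, the following minimal sketch samples a partition from the Chinese restaurant process, the partition distribution induced by a Dirichlet process. It covers the prior only: it is not the authors' method, and it includes neither the skew t-distribution likelihood nor the sequential posterior inference. The function name `sample_crp_partition` and the parameters `n_points`, `alpha`, and `seed` are hypothetical and chosen for illustration.

```python
import random


def sample_crp_partition(n_points, alpha, seed=0):
    """Sample cluster labels for n_points items from a Chinese restaurant process.

    alpha is the concentration parameter: larger values favour more clusters.
    Returns (labels, cluster_sizes).
    """
    rng = random.Random(seed)
    cluster_sizes = []  # cluster_sizes[k] = number of items currently in cluster k
    labels = []
    for _ in range(n_points):
        # An existing cluster is chosen with probability proportional to its size,
        # a new cluster with probability proportional to alpha.
        weights = cluster_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(1)  # open a new cluster
        else:
            cluster_sizes[k] += 1
        labels.append(k)
    return labels, cluster_sizes


labels, cluster_sizes = sample_crp_partition(n_points=200, alpha=1.0)
print("number of clusters:", len(cluster_sizes))
print("cluster sizes:", cluster_sizes)
```

The number of clusters is an outcome of the sampling rather than an input. In a full mixture model, the data likelihood would reweight these assignment probabilities, which is how the number of cell populations can be inferred from the data.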